Latest Update: 2021-12-07 18:08:16

Introduction

Data Background

This dataset comes from the National Vital Statistics System and covers heart disease mortality in the US for 2014, the center year of a 2013 to 2015 three-year average. The data were collected at the county level. Basic information about the dataset:

  • 2013 to 2015, 3-year average. Rates are age-standardized. County rates are spatially smoothed. The data can be viewed by gender and race/ethnicity. Data source: National Vital Statistics System. Additional data, maps, and methodology can be viewed on the Interactive Atlas of Heart Disease and Stroke http://www.cdc.gov/dhdsp/maps/atlas
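To make "age-standardized" concrete, here is a minimal sketch of direct age standardization in base R. The age bands, death counts, populations, and standard-population weights are all made-up numbers for illustration; the actual standard population and the spatial smoothing used by the Atlas are not reproduced here.

```r
# Hypothetical age bands, deaths, and populations for one county (made-up numbers).
county <- data.frame(
  age_band   = c("35-54", "55-74", "75+"),
  deaths     = c(10, 60, 150),
  population = c(20000, 10000, 3000)
)

# Hypothetical standard-population weights; they must sum to 1.
std_weight <- c(0.55, 0.33, 0.12)

# Age-specific death rates per 100,000 population.
age_rate <- county$deaths / county$population * 1e5

# Directly standardized rate: weighted sum of the age-specific rates.
adj_rate <- sum(age_rate * std_weight)

# Crude rate for comparison: total deaths over total population.
crude_rate <- sum(county$deaths) / sum(county$population) * 1e5
```

Because every county is weighted by the same standard age structure, standardized rates can be compared across counties with different age distributions, which crude rates cannot.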

Main question: How were gender and race associated with the heart disease death rate in CA during 2014?

Sub-questions

  • What was the association between gender and heart disease death rate in California?
  • What was the association between race and heart disease death rate in California?
  • Which counties had relatively higher heart disease death rates within the gender stratification?
  • Which counties had relatively higher heart disease death rates within the race stratification?

Methods

The data were obtained from the CDC Chronic Disease and Health Promotion Data & Indicators portal: https://chronicdata.cdc.gov/Heart-Disease-Stroke-Prevention/Heart-Disease-Mortality-Data-Among-US-Adults-35-by/i2vk-mgdh

Data variables included:

  • Year: Center of 3-year average
  • LocationAbbr: State, Territory, or US postal abbreviation
  • LocationDesc: county name
  • GeographicLevel: county/state
  • DataSource
  • Class: Cardiovascular Diseases
  • Topic: Heart Disease Mortality
  • Data_Value: heart disease death rate
  • Data_Value_Unit: per 100,000 population
  • Data_Value_Type: Age-adjusted, Spatially Smoothed, 3-year Average Rate
  • Data_Value_Footnote_Symbol
  • Data_Value_Footnote
  • StratificationCategory1: gender
  • Stratification1: gender categories
  • StratificationCategory2: race
  • Stratification2: race categories (White, Black, Hispanic, Asian and Pacific Islander, American Indian and Alaskan Native)
  • TopicID
  • LocationID
  • FIPS code
  • Location 1: latitude and longitude

Read In Data

# load the packages used below
library(data.table); library(dplyr); library(tidyr)
library(purrr); library(ggplot2); library(plotly)
# download and read in the data
if (!file.exists("Heart_Disease_Mortality_Data_Among_US_Adults__35___by_State_Territory_and_County.csv")) {
  options(timeout = 60)  # download.file() takes its timeout from options(), not from an argument
  download.file(
    url = "https://chronicdata.cdc.gov/api/views/i2vk-mgdh/rows.csv",
    destfile = "Heart_Disease_Mortality_Data_Among_US_Adults__35___by_State_Territory_and_County.csv",
    method = "libcurl"
  )
}
heartdisease <- data.table::fread("Heart_Disease_Mortality_Data_Among_US_Adults__35___by_State_Territory_and_County.csv")

Data Wrangling

#knitr::kable(summary(is.na(heartdisease)))
# NOTE: missing rates are replaced with 0, so they appear as zeros in later plots and maps
heartdisease$Data_Value <- heartdisease$Data_Value %>% replace_na(0)


# select county-level data in California
heartdisease_CA <- heartdisease[LocationAbbr == 'CA' & GeographicLevel == 'County']
# remove "()" in strings
heartdisease_CA$`Location 1` <- gsub("[()]", "", heartdisease_CA$`Location 1`)
# separate lat and lon variables
heartdisease_CA <- heartdisease_CA %>%
  separate(col = 'Location 1', into=c('lat', 'lon'), sep=',')
# convert chr to num
heartdisease_CA$Data_Value <- as.numeric(heartdisease_CA$Data_Value)
heartdisease_CA$lat <- as.numeric(heartdisease_CA$lat)
heartdisease_CA$lon <- as.numeric(heartdisease_CA$lon)

# select data under each stratification
CA_gender <- heartdisease_CA[Stratification1 != 'Overall' & Stratification2 == 'Overall']
CA_race <- heartdisease_CA[Stratification2 != 'Overall' & Stratification1 == 'Overall']
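The wrangling step above replaces missing death rates with 0. The following sketch, on made-up rates for hypothetical counties, shows how that choice can shift a simple summary compared with leaving the values as NA. It illustrates the trade-off only and is not a change to the analysis.

```r
# Toy rates for five hypothetical counties; NA marks a missing value.
toy <- data.frame(
  county = paste("County", LETTERS[1:5]),
  rate   = c(310, 280, NA, 350, NA)
)

# Zero-filling, as in the wrangling step: missing values become 0.
zero_filled <- ifelse(is.na(toy$rate), 0, toy$rate)
mean_zero_filled <- mean(zero_filled)             # zeros pull the mean down

# Alternative: keep NA and summarize the observed counties only.
mean_observed <- mean(toy$rate, na.rm = TRUE)
```

With zero-filling, density plots gain a spike at 0 and choropleth maps color those counties as if their rate were very low, so it may be worth checking how many rows are affected before interpreting the figures.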

Gender Data Handling

CA_male <- CA_gender[Stratification1 == 'Male']%>% 
  select(LocationDesc, Data_Value)%>%
  rename(value_male = Data_Value)
CA_female <- CA_gender[Stratification1 == 'Female'] %>% 
  select(LocationDesc, Data_Value)%>%
  rename(value_female = Data_Value)
gender_joint <- merge(CA_male, CA_female, by.x = "LocationDesc", 
             by.y = "LocationDesc", all.x = TRUE, all.y = FALSE)
gender_joint$Gap <- (gender_joint$value_male - gender_joint$value_female)

df <- read.csv('https://raw.githubusercontent.com/kjhealy/fips-codes/master/state_and_county_fips_master.csv')
fips <- filter(df,state == "CA")

CA_gender1 <- merge(gender_joint, fips, by.x = "LocationDesc", 
             by.y = "name", all.x = TRUE, all.y = FALSE)
# prepend "0" so the 4-digit CA codes match the 5-character ids in the county GeoJSON
CA_gender1 <- CA_gender1 %>% 
  mutate(fips = paste0("0", fips))
url <- 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'
counties <- rjson::fromJSON(file=url)
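The county GeoJSON identifies each county by a 5-character FIPS string, while read.csv() returns the codes as integers and drops the leading zero. Prepending "0" works for California because its codes have four digits; a width-based format is a more general alternative. A small sketch with example codes:

```r
# Numeric county codes as they might arrive after read.csv() (leading zero lost).
fips_num <- c(6001L, 6037L, 6115L)

# Left-pad with zeros to a fixed width of 5 characters.
fips_chr <- sprintf("%05d", fips_num)
```

Unlike paste0("0", ...), this also leaves codes that already have five digits unchanged, so it would carry over to states other than California.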

Race Data Handling

# create race subsets
# this part will be updated in the final project
CA_white <- CA_race[Stratification2 == 'White']%>% 
  select(LocationDesc, Data_Value)%>% 
  rename(value_white = Data_Value)
CA_hispanic <- CA_race[Stratification2 == 'Hispanic'] %>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_hispanic = Data_Value)
CA_black <- CA_race[Stratification2 == 'Black'] %>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_black = Data_Value)
CA_asian_pacific <- CA_race[Stratification2 == 'Asian and Pacific Islander'] %>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_asian_pacific = Data_Value)
CA_indian_alaskan <- CA_race[Stratification2 == 'American Indian and Alaskan Native']%>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_indian_alaskan = Data_Value)
data_list <- list(CA_white, CA_hispanic, CA_black, CA_asian_pacific, CA_indian_alaskan) 
CA_race1 <- data_list %>% reduce(inner_join, by = "LocationDesc")

CA_race1 <- merge(CA_race1, fips, by.x = "LocationDesc", 
             by.y = "name", all.x = TRUE, all.y = FALSE)
# prepend "0" so the 4-digit CA codes match the 5-character ids in the county GeoJSON
CA_race1 <- CA_race1 %>% 
  mutate(fips = paste0("0", fips))
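The reduce(inner_join, ...) step above keeps only counties that appear in every race table. If a county were absent from any one table, it would silently drop out of the joined result. The sketch below uses base R and made-up values for hypothetical counties to contrast an inner join with a full join.

```r
# Toy per-group tables; "County B" and "County C" are each missing from one table.
white    <- data.frame(LocationDesc = c("County A", "County B", "County C"),
                       value_white = c(300, 320, 290))
hispanic <- data.frame(LocationDesc = c("County A", "County B"),
                       value_hispanic = c(200, 210))
black    <- data.frame(LocationDesc = c("County A", "County C"),
                       value_black = c(400, 390))
tables <- list(white, hispanic, black)

# Inner join: only counties present in all tables survive.
inner <- Reduce(function(x, y) merge(x, y, by = "LocationDesc"), tables)

# Full join: every county is kept, with NA where a group has no row.
full <- Reduce(function(x, y) merge(x, y, by = "LocationDesc", all = TRUE), tables)
```

Comparing nrow(inner) with nrow(full) is a quick way to see whether the inner join removed any counties.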

Preliminary Results

Mortality rate distribution within sex and race stratification in CA

# create distribution graph to find association between gender and death rate
p1 <- ggplot(CA_gender, aes(Data_Value, fill = Stratification1))+ 
     geom_density(alpha = 0.5) +
     scale_fill_brewer(palette = "Set3") +
     labs(
      x = "death rate per 100,000 population",
      y = "Density",
      title = "Distribution of death rate by gender in CA")
p1 <- ggplotly(p1)

# create distribution graph to find association between race and death rate
p2 <- ggplot(CA_race, aes(Data_Value, fill = Stratification2))+ 
     geom_density(alpha = 0.5) +
     scale_fill_brewer(palette = "Set3") +
     labs(
      x = "death rate per 100,000 population",
      y = "Density",
      title = "Distribution of death rate by race in CA")
p2 <- ggplotly(p2)
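Density plots show the shape of each distribution; a per-group summary can put numbers next to them. This is a minimal sketch on made-up values that reuses the column names Stratification1 and Data_Value; the same call could be pointed at CA_gender or, with Stratification2, at CA_race.

```r
# Toy data with the same column names as CA_gender (values are made up).
toy <- data.frame(
  Stratification1 = rep(c("Male", "Female"), each = 4),
  Data_Value      = c(400, 420, 380, 450, 250, 270, 240, 300)
)

# Median death rate per group.
medians <- tapply(toy$Data_Value, toy$Stratification1, median)
```

The median is used here because it is less sensitive than the mean to a few extreme counties.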

Mortality rate pattern under sex stratification in CA during 2014

Mortality rate pattern under race stratification in CA during 2014

Mortality rate Gap Between Male group and Female group in CA

fig_gendergap <- plot_ly(gender_joint, x = ~value_male, y = ~value_female, text = ~LocationDesc, type = 'scatter', mode = 'markers',size = ~Gap, color = ~LocationDesc, colors = 'Paired',
        sizes = c(5, 45),
        marker = list(opacity = 0.5, sizemode = 'diameter'))
fig_gendergap <- fig_gendergap %>% 
  layout(title = 'Gender gap in heart disease death rate across CA counties',
         xaxis = list(title = 'Mortality rate/100,000 population (male)', showgrid = FALSE),
         yaxis = list(title = 'Mortality rate/100,000 population (female)', showgrid = FALSE))

fig_gendergap
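The scatter plot encodes the gap as marker size; a sorted table makes it easier to name the counties with the largest gaps. The sketch below uses made-up values for hypothetical counties and the same column names as gender_joint.

```r
# Toy version of gender_joint (values are made up).
toy_gap <- data.frame(
  LocationDesc = paste("County", LETTERS[1:4]),
  value_male   = c(420, 390, 450, 400),
  value_female = c(260, 300, 270, 310)
)
toy_gap$Gap <- toy_gap$value_male - toy_gap$value_female

# Sort from largest to smallest gap and keep the top two counties.
ranked <- toy_gap[order(-toy_gap$Gap), ]
top2 <- head(ranked$LocationDesc, 2)
```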

Mortality Rate Pattern in each county in CA (gender stratification)

# Pattern map for male group
fig_male <- plot_ly( text=~paste(paste("County: ", CA_gender1$LocationDesc),
                      paste("Death rate/100,000:", CA_gender1$value_male),
                      sep="<br>"),hoverinfo="text")
fig_male <- fig_male %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_gender1$fips,
    z = CA_gender1$value_male,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for female group
fig_female <- plot_ly( text=~paste(paste("County: ", CA_gender1$LocationDesc),
                      paste("Death rate/100,000:", CA_gender1$value_female),
                      sep="<br>"),hoverinfo="text")
fig_female <- fig_female %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_gender1$fips,
    z = CA_gender1$value_female,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

Male group mortality rate in CA during 2014

Female group mortality rate in CA during 2014

Mortality Rate Pattern in Each County in CA During 2014 (Race Stratification)

# Pattern map for White group
fig_white <- plot_ly(
  text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_white),
                      sep="<br>"),hoverinfo="text")
fig_white <- fig_white %>% 
  add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_white,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5)) %>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for Hispanic group
fig_hispanic <- plot_ly(
  text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_hispanic),
                      sep="<br>"),hoverinfo="text")
fig_hispanic <- fig_hispanic %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_hispanic,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for Black group
fig_black <- plot_ly(text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_black),
                      sep="<br>"),hoverinfo="text")
fig_black <- fig_black %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_black,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for Asian and Pacific Islander group
fig_asian_pacific <- plot_ly(text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_asian_pacific),
                      sep="<br>"),hoverinfo="text")
fig_asian_pacific <- fig_asian_pacific %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_asian_pacific,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for American Indian and Alaskan Native group
fig_indian_alaskan <- plot_ly(text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_indian_alaskan),
                      sep="<br>"),hoverinfo="text")
fig_indian_alaskan <- fig_indian_alaskan %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_indian_alaskan,
    colorscale="Viridis",
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))
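The seven map chunks above differ only in the data frame and the value column. One way to reduce the repetition is a small helper, sketched below. The names county_hover and make_county_map are hypothetical and not part of the original analysis; the sketch assumes the plotly package is installed and that geojson is the parsed county GeoJSON (counties above).

```r
# Build the hover label shown on the maps (pure base R).
county_hover <- function(county, rate) {
  paste(paste("County:", county),
        paste("Death rate/100,000:", rate),
        sep = "<br>")
}

# Hypothetical helper: one choropleth map for any rate column.
# `data` needs the columns LocationDesc, fips, and the column named by `value_col`.
make_county_map <- function(data, value_col, geojson, zmin = 150, zmax = 500) {
  fig <- plotly::plot_ly()
  fig <- plotly::add_trace(
    fig,
    type = "choroplethmapbox",
    geojson = geojson,
    locations = data$fips,
    z = data[[value_col]],
    text = county_hover(data$LocationDesc, data[[value_col]]),
    hoverinfo = "text",
    colorscale = "Viridis",
    zmin = zmin, zmax = zmax,
    marker = list(line = list(width = 0), opacity = 0.5)
  )
  # Same basemap, zoom, and center as the maps above.
  plotly::layout(
    fig,
    mapbox = list(style = "carto-positron", zoom = 4,
                  center = list(lon = -119.42, lat = 36.78))
  )
}
```

With this in place, a call such as make_county_map(CA_race1, "value_white", counties) would stand in for one of the chunks above, and the shared color range (zmin, zmax) stays consistent across maps by default.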

White

Hispanic

Black

Asian and Pacific Islander

American Indian and Alaskan Native

Conclusion

Both gender and race were associated with the heart disease death rate in California during 2014. Within the gender stratification, females generally had a lower death rate than males. Among females, Kern County, Tulare County and Glenn County had relatively higher death rates; among males, Tulare County and Tuolumne County did. Overall, Kern County and Tulare County had relatively higher heart disease death rates. One possible reason is that the level of medical services differs between counties. The overall trend shows that counties along the coast had lower death rates, which might be due to a higher level of development.

Within the race/ethnicity stratification, the Black group had the highest death rate, especially in Lassen County, Kings County and Tulare County. The White group and the American Indian and Alaskan Native group had mid-level death rates, with slightly higher rates in Stanislaus County and Shasta County, respectively. The Asian and Pacific Islander group had a lower-middle death rate, slightly higher in Mariposa County. The Hispanic group had the lowest death rate, slightly higher in Kern County. Possible reasons include differences in the distribution of races, education level, income level, healthcare availability and access to medical services.

Download the Full Report

The full PDF version can be downloaded here.

Citation